Purpose: Since the introduction of the Framingham Risk Score (FRS), numerous versions
of coronary heart disease (CHD) prediction models have claimed improvement over the FRS.
Tzoulaki et al challenged the validity of these claims by illustrating methodology
deficiencies among the studies. However, the question remains: Is it possible to create
a new CHD model that is better than the FRS while overcoming the noted deficiencies? To
address this, a new CHD prediction model was developed by integrating additional risk
factors, using a novel modeling process.

Methods: Using the National Health and Nutrition Examination Survey III data set with
CHD-specific mortality outcomes and the Atherosclerosis Risk in Communities data set with
CHD incidence outcomes, two FRSs (FRSv1 from 1998 and FRSv2 from the National Cholesterol
Education Program Adult Treatment Panel III), along with an additional risk score in which
the high density lipoprotein (HDL) component of FRSv1 was ignored (FRSHDL), were compared
with a new CHD model (NEW-CHD). This new model contains seven elements: the original
Framingham equation, which is the basis of FRSv1, and six additional risk factors.
Discrimination, calibration, and reclassification improvements were all assessed among
models.

Results: Discrimination was improved for NEW-CHD in both cohorts when compared with
FRSv1 and FRSv2 (P<0.05) and was similar in magnitude to the improvement of FRSv1 over
FRSHDL. NEW-CHD had a similar calibration to FRSv2 and was improved over FRSv1. Net
reclassification for NEW-CHD was substantially improved over both FRSv1 and FRSv2, for
both cohorts, and was similar in magnitude to the improvement of FRSv1 over FRSHDL.

Conclusion: While overcoming several methodology deficiencies reported by earlier authors,
the NEW-CHD model improved CHD risk assessment when compared with the FRSs, to a degree
comparable to the improvement gained by adding HDL to the FRS.

Keywords: risk assessment, atherosclerotic risk in communities, NHANES, epidemiology 

Introduction 

Widely regarded as the gold standard among coronary heart disease (CHD) risk assessment
tools, the Framingham Risk Score (FRS) was developed from the Framingham Heart
Study and has been validated in multiple populations. However, several widely
accepted risk factors for heart disease were not included in the original model.
These omissions contribute to the much-debated question of whether additional risk
factors could materially improve the FRS in assessing CHD risk.

Tzoulaki et al recently compiled a systematic review of studies claiming improvement
over the FRS. The review concluded that most claims were questionable
because of various deficiencies in the validation study designs, including improper
data use (eg, using the same data to develop and validate the model); incorrect use of
FRS by including individuals with previous CHD diagnoses;
incomplete use of tests for discrimination, calibration, and
reclassification; and publication bias.

The objective of the present study was to demonstrate
that FRS can indeed be improved by developing a new CHD
model that combines the original Framingham equation with
six additional literature-derived risk factors, using a novel
modeling process. The accuracy of this NEW-CHD model
was then compared with the FRS while striving to avoid the
methodology deficiencies outlined by Tzoulaki et al.

Material and methods 

Models and data sets 
FRS 

Two versions of the FRS are currently in use. The first
version (FRSv1), published in 1998, includes diabetes
diagnosis status and evaluates CHD risk among individuals
who are free of CHD at presentation. The second version
(FRSv2) was published within the National Cholesterol
Education Program Adult Treatment Panel III guideline
and includes hypertension medication status among its
inputs but excludes the diabetes diagnosis, thereby making
FRSv2 applicable only to individuals without diabetes
or CHD.

For the current research, a modified version of FRSv1 (FRSHDL) was
also developed by excluding high density lipoprotein (HDL).
This abbreviated model was used to compare the
incremental accuracy gain of the NEW-CHD model with the
difference observed between FRSv1 and FRSHDL.

NEW-CHD model 

The NEW-CHD model was developed using an alternative
model-building method called synthesis analysis. Synthesis
analysis is used to develop comprehensive risk-assessment
models by combining literature-derived risk factors with
partially adjusted relative risks. Details of synthesis
analysis and its statistical validation have been reported
elsewhere.

The NEW-CHD model was created by combining the
original Framingham risk equation (FRE), which also was
the basis for FRSv1, with additional literature-derived risk
factors, including family history of CHD (father/brother
with CHD before age 60 years or mother/sister before age
65 years), physical exercise level (lower, equal, or higher
when compared with peers), body mass index (kg/m²), serum
albumin, apolipoprotein A, and plasma fibrinogen. The selection
of these variables was based on their availability in both
validation data sets.



Population data 

Two longitudinal data sets were used in the present study:
the Atherosclerosis Risk in Communities (ARIC) study and
the Third National Health and Nutrition Examination Survey
(NHANES III). ARIC is a prospective epidemiologic study
conducted in four US communities, and CHD incidence
was recorded during the follow-up interval. Details of the
ARIC and NHANES study designs have been previously
described. The NHANES III data were supplemented
with CHD-specific mortality data at follow-up.

To allow for appropriate comparisons among the three
models (FRSv1, FRSv2, and NEW-CHD), all individuals
with CHD or diabetes at baseline were excluded from
analysis. Use of hypertension medication, a FRSv2 input,
was the only variable not included in both data sets.

The ARIC study data set contained 13,657 individuals,
aged 45 to 64 years at baseline. In the 10 years after baseline
evaluation, 759 CHD cases, defined as clinically diagnosed
myocardial infarction, electrocardiogram diagnosis of myocardial
infarction, fatal CHD event, or receipt of CHD-related
clinical procedures, were reported.

NHANES III subjects were limited to those aged 40 to 
70 years (n=5,706), the appropriate ages for both NEW-CHD 
and FRS applications. Over an average follow-up interval 
of 14 years, 88 CHD-associated deaths were recorded, as 
documented by an International Statistical Classification of 
Diseases and Related Health Problems, Tenth Revision, cause 
of death code of 059-061.

Statistical analysis 

The performance features of the risk assessment models were
evaluated using discrimination, calibration, and reclassification
indices. Discrimination was evaluated by the c-statistic
or area under the receiver operating characteristic curve.
With several methods being appropriate for calculating the
c-statistic, the current study used the method described
by D'Agostino et al and DeLong et al. Among selected
models, c-statistic comparisons were made by evaluating the
pairwise c-statistic difference. The 95% confidence interval
of the difference was derived from bootstrapping 100 data
sets that were randomly selected from the study data.
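
The discrimination comparison can be illustrated with a minimal Python sketch. It is
not the SAS code used in the study: it treats the outcome as binary (ignoring censoring),
computes the c-statistic by direct pairwise comparison, and uses a simple percentile
interval. The function names and the default of 100 resamples are assumptions chosen
to mirror the description above.

```python
import random


def c_statistic(risks, events):
    """Share of case/non-case pairs in which the case has the higher
    predicted risk; tied risks count as half a concordant pair."""
    cases = [r for r, e in zip(risks, events) if e == 1]
    controls = [r for r, e in zip(risks, events) if e == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def bootstrap_c_difference(risks_new, risks_old, events, n_boot=100, seed=0):
    """Mean and approximate 95% percentile interval of the difference
    c(new) - c(old) over n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(events)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ev = [events[i] for i in idx]
        if sum(ev) in (0, n):
            continue  # resample has no cases or no non-cases; draw again
        new = [risks_new[i] for i in idx]
        old = [risks_old[i] for i in idx]
        diffs.append(c_statistic(new, ev) - c_statistic(old, ev))
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[min(n_boot - 1, int(0.975 * n_boot))]
    return sum(diffs) / n_boot, lower, upper
```

Both models are scored on the same resampled individuals, so the interval reflects the
paired difference rather than two independent estimates.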

Calibration is defined as the closeness between observed
incidence and predicted probability. To evaluate a model's
calibration, data are typically divided into deciles according
to predicted risk; subsequently, the closeness between
Kaplan-Meier-derived incidence and average predicted probabilities
from a given model is tested by Hosmer-Lemeshow
chi-square statistics. In general, a chi-square greater than 20 is
evidence of lack of fit. When a model is evaluated in a data
set other than the data set from which it was developed, lack
of fit may be a result of differences in outcome definition. To
address this problem, the model can be recalibrated, which
forces the overall predicted probability to equal the overall
incidence. Only ARIC contained a CHD-incidence outcome;
therefore, calibration assessments were performed in this
data set only.
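
The calibration test and the recalibration step can be sketched as follows. This is a
simplified illustration under stated assumptions: observed incidence is taken as the
crude event proportion in each risk group rather than a Kaplan-Meier estimate, and
recalibration is implemented as a single multiplicative factor, which is only one of
several ways to force the mean prediction to match the overall incidence. The function
names are hypothetical.

```python
def hosmer_lemeshow(pred, events, groups=10):
    """Hosmer-Lemeshow chi-square over equal-sized groups of predicted risk.
    Assumption: observed incidence is the crude event proportion."""
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n = len(order)
    chi_square = 0.0
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        if not members:
            continue
        size = len(members)
        mean_pred = sum(pred[i] for i in members) / size
        observed = sum(events[i] for i in members)
        expected = size * mean_pred
        variance = size * mean_pred * (1.0 - mean_pred)
        if variance > 0:
            chi_square += (observed - expected) ** 2 / variance
    return chi_square


def recalibrate(pred, events):
    """Scale predictions so their mean equals the overall incidence.
    Assumption: one multiplicative factor; values are capped at 1.0,
    so the means match exactly only when no value is capped."""
    incidence = sum(events) / len(events)
    mean_pred = sum(pred) / len(pred)
    factor = incidence / mean_pred
    return [min(1.0, p * factor) for p in pred]
```

A model that ranks individuals well but systematically under- or overestimates risk will
show a large chi-square before recalibration and a much smaller one afterward.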

Reclassification measures the change in classification
between a "new" model and an "old" model. In clinical
practice, this change is very important, as different risk
classifications could prompt different treatments. If the new
model improves classification over the old model, one would
expect the new model to yield more "correct" reclassifications
than "incorrect" reclassifications. This evaluation is
known as net reclassification improvement (NRI). It is well known
that NRI is dependent on the definition of classification
thresholds. To overcome this issue, Pencina et al introduced a
new parameter, classless NRI, which is no longer dependent
on any arbitrarily defined risk classes. In the present study,
both class NRI and classless NRI were used. When class NRI
was calculated, the ATP III guideline-defined 10-year CHD
risk thresholds of lower than 10%, 10%-20%, and higher
than 20% were used.
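
A minimal Python sketch of both NRI variants is given below. It assumes binary outcomes
without censoring and uses the standard definition, NRI = [P(up | event) - P(down | event)]
+ [P(down | non-event) - P(up | non-event)]. The function names are hypothetical, and
placing a risk of exactly 20% in the middle class is an assumption about boundary handling.

```python
def atp3_class(risk):
    """Map a 10-year CHD risk to the three ATP III classes:
    0 = lower than 10%, 1 = 10%-20%, 2 = higher than 20%."""
    if risk < 0.10:
        return 0
    if risk <= 0.20:
        return 1
    return 2


def nri(old, new, events, classify=None):
    """Net reclassification improvement of `new` over `old`.
    With classify=None any change in predicted risk counts as a move
    (classless NRI); with classify=atp3_class only changes of risk
    class count (class NRI)."""
    key = classify if classify is not None else (lambda r: r)
    up_event = down_event = up_nonevent = down_nonevent = 0
    for old_risk, new_risk, event in zip(old, new, events):
        before, after = key(old_risk), key(new_risk)
        if after > before:
            if event:
                up_event += 1
            else:
                up_nonevent += 1
        elif after < before:
            if event:
                down_event += 1
            else:
                down_nonevent += 1
    n_event = sum(1 for e in events if e)
    n_nonevent = len(events) - n_event
    return ((up_event - down_event) / n_event
            + (down_nonevent - up_nonevent) / n_nonevent)
```

Because every change in predicted risk counts as a move, the classless NRI ranges from
-2 to 2 and is typically larger in magnitude than the class NRI for the same pair of models.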

In addition, comparisons between the NEW-CHD and
FRSv1 were made in the ARIC data, using a Cox proportional
hazards model. This comparison was chosen because
FRSv1 was directly derived from the original FRE, and
FRE is the foundation for the NEW-CHD model. The Cox
model included eight independent variables: FRSv1 and
seven NEW-CHD elements (the FRE and six additional risk
factors). This model is designed to evaluate which elements
of the NEW-CHD model contribute most to the incremental
accuracy of NEW-CHD over FRSv1.
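
Table 5 reports the hazard ratios for continuous covariates per 1 standard deviation.
The short sketch below shows the two equivalent ways such a scale is usually obtained:
standardizing a covariate before fitting, or rescaling a per-unit log-hazard coefficient
afterward. It does not fit a Cox model, the helper names are hypothetical, and whether
the study standardized before or after fitting is not stated in the text.

```python
import math
import statistics


def standardize(values):
    """Return z-scores (mean 0, sample standard deviation 1)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def hazard_ratio_per_sd(beta_per_unit, sd):
    """Convert a per-unit Cox coefficient (log hazard ratio) into a
    hazard ratio per 1 standard deviation of the covariate."""
    return math.exp(beta_per_unit * sd)
```

Expressing all continuous covariates per 1 standard deviation puts their hazard ratios
on a common scale, which is what allows the elements of the model to be compared directly.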

All analyses were performed using SAS 9.3 (SAS 
Institute Inc., Cary, NC, USA). 

Results 

As shown in Table 1, the NEW-CHD model yielded higher
c-statistics than FRSv1 and FRSv2 in both data sets and for
both genders. On evaluating the differences among pairwise
c-statistics, the discrimination gains with NEW-CHD were
higher than those observed between FRSv1 and FRSHDL,
and the results were consistent between the two data sets and
between genders (Table 2).

As shown in Table 3, the recalibration effect appears
to be very dramatic for FRSv1 and FRSHDL; in contrast,
recalibration brought both the NEW-CHD and FRSv2 models
near the Hosmer-Lemeshow chi-square accepted threshold
of 20, which is indicative of no lack of fit.

Table 1 C-statistics of the four selected models assessed by
gender in the National Health and Nutrition Examination Survey III
and Atherosclerosis Risk in Communities

                                  NEW-CHD model   FRSv2   FRSv1   FRSHDL
National Health and Nutrition Examination Survey III
  Male                            0.726           0.663   0.686   0.674
  Female                          0.779           0.728   0.742   0.772
  Total                           0.751           0.706   0.717   0.713
Atherosclerosis Risk in Communities
  Male                            0.693           0.615   0.616   0.575
  Female                          0.760           0.691   0.709   0.672
  Total                           0.764           0.723   0.674   0.603

Abbreviations: NEW-CHD, New Coronary Heart Disease model; FRSv1, Version
1 of the Framingham Risk Score; FRSv2, Version 2 of the Framingham Risk Score;
FRSHDL, Version 1 of the Framingham Risk Score without HDL; HDL, high density
lipoprotein.

NRI values between selected model pairs follow a pattern
very similar to that of the pairwise c-statistic differences (Table 4).
Results demonstrate that the NRI values of NEW-CHD over
FRSv1 and FRSv2 are all statistically significant.

In a Cox model in which FRSv1 and seven NEW-CHD
model elements were treated as covariates, FRE and three
of the six additional risk factors significantly contributed to
the CHD risk prediction, even while adjusting for FRSv1
(Table 5), indicating that the FRE still contains additional predictive
power beyond that captured in FRSv1. These findings
illustrate that the NEW-CHD model outperforms FRSv1.

Discussion 

In the present study, a new CHD risk assessment model
(NEW-CHD), developed by integrating six additional
literature-derived variables into the original FRE, was demonstrated
to outperform the FRSv1 and FRSv2 in discrimination
and reclassification. After recalibration, the NEW-CHD
model outperformed FRSv1 and was similar to FRSv2 in
calibration. Notably, the improvement of the NEW-CHD
model over the FRS was achieved despite three of the six
additional variables in the NEW-CHD model making no significant
contribution to the overall risk assessment power.

Although emerging risk factors are known to contribute to
CHD prediction, it remains uncertain whether any CHD
model, with additional risk factors incorporated, could significantly
outperform FRS. In their review, Tzoulaki et al cited
three deficiencies among studies reporting claimed benefit over
FRS: the population or the CHD outcome was not properly
selected; prediction accuracy properties, including discrimination,
calibration, and classification, were not comprehensively
tested; and researchers who validated the models were typically
the same authors who developed the models.

Table 2 Differences among pairwise c-statistics and corresponding 95% confidence intervals
between selected model pairs by gender in the National Health and Nutrition Examination
Survey III and Atherosclerosis Risk in Communities

          NEW-CHD versus FRSv2             NEW-CHD versus FRSv1             FRSv1 versus FRSHDL
          Difference  95% confidence       Difference  95% confidence       Difference  95% confidence
                      interval                         interval                         interval
National Health and Nutrition Examination Survey III
  Male    0.061       0.056-0.067          0.036       0.032-0.040          0.013       0.009-0.018
  Female  0.052       0.043-0.060          0.032       0.023-0.041          -0.026      -0.029 to 0.022
  Total   0.041       0.036-0.046          0.034       0.029-0.038          0.004       0.001-0.008
Atherosclerosis Risk in Communities
  Male    0.079       0.077-0.080          0.069       0.068-0.071          0.037       0.036-0.038
  Female  0.091       0.089-0.094          0.049       0.047-0.051          0.033       0.032-0.035
  Total   0.041       0.039-0.042          0.090       0.088-0.090          0.071       0.071-0.073

Abbreviations: NEW-CHD, new coronary heart disease model; FRSv1, Version 1 of the Framingham Risk Score; FRSv2, Version 2 of the Framingham Risk Score; HDL,
high density lipoprotein.

In the present study, special consideration was given
to address each of Tzoulaki's critiques. First, the NEW-CHD
model was not empirically derived from either ARIC
or NHANES III data but, rather, was constructed using
literature-derived information. The finding that three risk factors
integrated into the NEW-CHD model did not significantly
contribute to the prediction in the ARIC data is indicative of
external validation; furthermore, consistent observations in
different data sources add to the external validation strength
of NEW-CHD.

Although NEW-CHD does not have restrictions on the
eligible population, to make the data sets compatible with FRSv1
and FRSv2, validation was performed among individuals
free of CHD and diabetes at baseline. It is important to note
that including individuals with CHD or diabetes would increase various
accuracy indices (particularly discrimination) of the NEW-CHD
model; however, NEW-CHD would then no longer be comparable
with FRS.



Table 3 Hosmer-Lemeshow chi-square values of the lack of fit
test between predicted and observed coronary heart disease risk
for selected models in Atherosclerosis Risk in Communities

                        NEW-CHD model   FRSv2   FRSv1   FRSHDL
Before recalibration    47.7            28.5    173     138
After recalibration     19.7            20.8    49.0    25.6

Abbreviations: NEW-CHD, New Coronary Heart Disease model; FRSv1, Version
1 of the Framingham Risk Score; FRSv2, Version 2 of the Framingham Risk Score;
FRSHDL, Version 1 of the Framingham Risk Score without HDL; HDL, high density
lipoprotein.



Second, the present study assessed various model 
accuracy indices including discrimination, calibration, and 
reclassification. Although not all statistical indices were 
included in the present study, those statistical terms, widely 
recognized as important in model accuracy assessment, 
were evaluated to address Tzoulaki's predictive accuracy 
critique. 

For Tzoulaki's third issue, although it can be reasonably 
argued that conflicting interests and potential publication bias 
do play a role in validation studies, it should not be assumed 
that every researcher who develops a model will always bias 
its validation. The sole purpose of developing the NEW-CHD 
model was to demonstrate that FRS can indeed be improved. 
The risk factors added to the NEW-CHD model were only 
limited by their availability in the data sets. 

Results from the current study illustrate that recalibration
was necessary before a true comparison of model calibration
could be made. Although none of the models scored
extremely well, NEW-CHD was the only model with a
chi-square of less than 20, indicative of adequate model fit.

A higher positive NRI indicates that the NEW-CHD
model more often correctly identified patients who were truly at higher
or lower risk than did the FRS models. Not only
did the present study demonstrate that FRS can indeed be
improved, it also showed that the magnitude of the accuracy
gain can be substantial. When evaluating discrimination and
reclassification, the NEW-CHD improvements over FRSv1
and FRSv2 were similar to or greater than the accuracy
gain of FRSv1 over FRSHDL. Thus, these findings yield a
simplified interpretation that NEW-CHD added to the discrimination
of FRS as much as or more than HDL added
to FRSHDL.

Although the NEW-CHD model was successfully validated and
shown to outperform the standard FRS, one limitation of the
present study is that it only allows for validation of models
containing risk factors that were available in both data sets.
Unfortunately, one FRSv2 input, hypertension medication,
which is not used in FRSv1, was not included in both data sets.
Having this variable may have increased the accuracy of FRSv2
beyond that shown in the present study; however, little evidence indicates that
FRSv2 was significantly improved over FRSv1 (ie, FRSv1
and FRSv2 are almost equally adopted in clinical practice).
Given this information, it is very unlikely that having the
hypertension medication variable in the data would yield a
significant change in the study results.

Table 4 Class and classless net reclassification improvement and corresponding 95% confidence intervals
between selected models by gender in the National Health and Nutrition Examination Survey III and
Atherosclerosis Risk in Communities

          NEW-CHD model versus FRSv2   NEW-CHD model versus FRSv1   FRSv1 versus FRSHDL
          NRI    95% confidence        NRI    95% confidence        NRI    95% confidence
                 interval                     interval                     interval
National Health and Nutrition Examination Survey III (class NRI)
  Male    0.10   0.07-0.13             0.56   0.53-0.59             0.10   0.08-0.12
  Female  0.20   0.18-0.22             0.01   0.00-0.03             0.00   -0.02 to 0.03
  Total   0.16   0.14-0.18             0.26   0.24-0.28             0.07   0.06-0.09
National Health and Nutrition Examination Survey III (classless NRI)
  Male    0.60   0.55-0.65             1.12   1.07-1.17             0.74   0.69-0.78
  Female  0.55   0.52-0.59             0.78   0.75-0.81             0.35   0.31-0.38
  Total   0.60   0.57-0.63             0.98   0.95-1.01             0.53   0.50-0.56
Atherosclerosis Risk in Communities (class NRI)
  Male    0.11   0.09-0.12             0.67   0.66-0.68             0.17   0.16-0.17
  Female  0.18   0.18-0.19             0.13   0.12-0.13             0.10   0.09-0.10
  Total   0.20   0.20-0.21             0.36   0.36-0.36             0.13   0.12-0.13
Atherosclerosis Risk in Communities (classless NRI)
  Male    1.14   1.13-1.16             0.90   0.88-0.91             1.19   1.18-1.21
  Female  0.60   0.59-0.61             0.67   0.66-0.67             0.58   0.58-0.59
  Total   0.83   0.82-0.83             0.76   0.75-0.77             0.85   0.85-0.86

Abbreviations: NEW-CHD, New Coronary Heart Disease model; FRSv1, Version 1 of the Framingham Risk Score; FRSv2, Version 2 of the Framingham Risk Score;
FRSHDL, Version 1 of the Framingham Risk Score without HDL; HDL, high density lipoprotein; NRI, net reclassification improvement.

Table 5 Hazard ratios and corresponding 95% confidence
intervals resulting from a Cox model of coronary heart disease
incidence in the Atherosclerosis Risk in Communities cohort with
the FRSv1, Framingham equation, and six additional risk factors
as covariates

Covariates                                      Hazard ratio   95% confidence interval
Framingham Risk Score (1 standard deviation)    1.23           1.12-1.35
Framingham equation (1 standard deviation)      1.62           1.47-1.79
Family history (yes versus no)                  0.92           0.45-1.87
Body mass index (1 standard deviation)          1.00           0.92-1.08
Exercise (low, moderate, high)                  1.01           0.92-1.10
Apolipoprotein A (1 standard deviation)         1.11           1.04-1.18
Fibrinogen (1 standard deviation)               1.15           1.07-1.23
Albumin (1 standard deviation)                  0.91           0.85-0.99


Another weakness of the present study is the small 
outcome sample size of the NHANES III data, with only 
88 CHD deaths. Although this finding did not appear to affect 
statistical significance test results, a larger sample size could 
add credibility to the results. 

A final shortcoming of the present study is that since the
preparation of this analysis, new lipid guidelines have been
put forth by the American College of Cardiology/American
Heart Association Task Force on Practice Guidelines.
These new guidelines make use of a new risk model, the
Pooled Cohort Equations. This limits the clinical application
of the present analysis, particularly the NRI.
However, the original intent, to test the extendibility of the
FRS, has been met.

Conclusion 

In summary, various statistical model accuracy indices
have been used to demonstrate the validity of a new CHD
assessment tool that combines six well-known risk factors
with the FRE. Across discrimination,
calibration, and reclassification, this new model has been
shown to perform better than both Framingham score
models, and this improvement is similar in magnitude to
that of incorporating HDL into the FRSv1. This exercise should
lend confidence to those researchers who would heed the
challenge of developing new and improved disease prediction
models.